BACKGROUND


SRA used a language-independent, domain-independent, multipurpose text understanding system as the core
of its MUC-5 system for extraction from English and Japanese joint venture texts. SRA’s NLP core system,
SOLOMON, has been under development since 1986. It has been used in a variety of domains, and was
designed from the start to be language-independent, domain-independent, and application-independent. More
recently, SOLOMON has been extended to be multilingual, beginning with Spanish in 1990 and Japanese in
1991. The Spanish-Japanese text understanding system that uses SOLOMON was developed for a domain
very different from the MUC-5 joint venture domain (cf. Aone, et al. [2]).

SOLOMON’s principal applications have been in data extraction, but it is also used in a prototype
machine translation system (cf. Aone and McKee [5]). SOLOMON applications have been developed for the
financial, terrorism, and medical domains, as well as the MUC-5 joint-venture domain. SRA has
significantly enhanced its capability to add new domains and languages by developing new strategies for
data acquisition using both statistical techniques and a variety of user-friendly tools.


MUC-5  SYSTEM  ARCHITECTURE 


SOLOMON employs a modular, data-driven architecture to achieve its language- and domain-independence.
The MUC-5 system, which uses SOLOMON as a core engine, consists of seven processing modules and
corresponding data modules, as shown in Figure 1. Each module is described in the following sections.


Message  Zoner 

The Message Zoner uploads the SGML-annotated text file into the data extraction system. Input files are
assumed to have been preprocessed so that they contain only “rigorous markup” (cf. Goldfarb [8]) SGML
tags and text; however, we do not require sentences or paragraphs to be tagged. Japanese text is assumed
to be encoded in EUC, but tags must be ASCII.

All input, including tags, is tokenized using a simple, language-independent, regular expression recognizer.
The (multi-word) tokens are parsed into sentences, paragraphs, headers and documents using a simple
operator-precedence grammar (cf. Aho, Sethi and Ullman [1]) operating on punctuation and tags. The
tokenizer and parser are written entirely in lex.

Figure 1: MUC-5 System Architecture


Sentence and paragraph boundaries are inferred using a conservative algorithm and marked as inferred.
Inference is not performed if sentences and paragraphs are rigorously marked. The output is piped to a
post-processor, which does a fast lookup of each word in a btree gazetteer, and includes entry information
in the tokens of place names.
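
The tokenizing and boundary-inference steps above can be illustrated with a minimal sketch. It is written in Python rather than lex, and the token classes, the abbreviation list, and the splitting test are all assumptions made for illustration; the actual recognizer and operator-precedence grammar are not reproduced here.

```python
import re

# Hypothetical token classes; the original recognizer was written in lex.
TOKEN_RE = re.compile(r"""
    (?P<tag><[^<>]+>)             # SGML tag such as <DOC> or </TXT>
  | (?P<number>\d+(?:[.,]\d+)*)   # numbers such as 20,000 or 8.1
  | (?P<word>\w+(?:[-']\w+)*)     # words, including hyphenated ones
  | (?P<punct>[^\w\s])            # any other single punctuation character
""", re.VERBOSE)

# Hypothetical abbreviations that must not end a sentence.
ABBREVIATIONS = {"CO", "CORP", "INC", "LTD"}

def tokenize(text):
    """Return (kind, text) pairs for every token in the input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def infer_sentences(tokens):
    """Conservatively split after '.', '!' or '?': only when the mark does not
    follow a known abbreviation and the next token is a tag, starts with an
    uppercase letter, or does not exist."""
    sentences, current = [], []
    for i, (kind, tok) in enumerate(tokens):
        current.append(tok)
        if kind != "punct" or tok not in ".!?":
            continue
        previous = current[-2].upper() if len(current) > 1 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        starts_new = nxt is None or nxt[0] == "tag" or nxt[1][:1].isupper()
        if previous not in ABBREVIATIONS and starts_new:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences
```

A splitter of this kind prefers to miss a boundary rather than invent one, which is the sense in which the inference is conservative.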


Preprocessing 


Preprocessing  consists  of  two  processors,  the  morphological  analyzer  and  the  pattern  matcher,  and  associated 
data  in  the  form  of  morphological  data,  lexicons,  and  patterns  for  each  language.  Its  input  is  a  tokenized 
message,  and  its  output  is  a  series  of  lexical  entries  with  syntactic  and  semantic  attributes. 

Declarative morphological data for inflection-rich Japanese and Spanish is compiled into finite-state
machines. The English domain lexicon was derived from development texts automatically, using a statistical
technique (cf. McKee and Maloney [10]). This derived lexicon also contains automatically acquired
domain-specific subcategorization frames and predicate-argument mapping rules called situation types
(cf. Aone and McKee [3]), as shown in Figure 2.

Pattern recognition handles a wide range of phenomena, including multi-words, numbers, acronyms,
money, dates, person names, locations, and organizations. We extended the pattern matcher to handle
multi-level pattern recognition. The pattern data are divided into ordered groups called priority groups,
and the patterns in each group are fired sequentially, avoiding recursive applications as much as possible.
This extension sped up Preprocessing significantly.
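
A minimal sketch of the priority-group idea follows, assuming regular-expression patterns and an inline `<LABEL:text>` annotation that are invented here for illustration; the actual pattern language is not shown in this paper.

```python
import re

# Hypothetical priority groups. Earlier groups run first and every group is
# applied exactly once, so later patterns can build on earlier labels
# without any pattern being re-applied recursively.
PRIORITY_GROUPS = [
    [("NUMBER", re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?\b"))],
    [("MONEY", re.compile(r"\$<NUMBER:[^>]+>(?: (?:million|billion))?"))],
]

def apply_groups(text, groups=PRIORITY_GROUPS):
    """Run each priority group once, in order, labelling every match."""
    for group in groups:
        for label, pattern in group:
            text = pattern.sub(
                lambda m, label=label: f"<{label}:{m.group()}>", text)
    return text
```

Because the MONEY pattern refers to the NUMBER label produced by the first group, a single ordered pass is enough, which is one plausible reading of why ordering the groups reduces recursive application.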


Syntactic  Analysis 

The processor for Syntactic Analysis is a parser based on Tomita’s algorithm (cf. Tomita [11]), with
modifications for disambiguation during parsing. Syntactic Analysis data consist of X-bar based phrase
structure grammars and preparse patterns for each of the three languages, English, Japanese, and Spanish.
Syntactic Analysis outputs F-structures (grammatical relations), along the lines of Lexical-Functional
Grammar (cf. Bresnan [7]), as shown in Figure 3. The Semantic Interpretation module is interleaved for
disambiguation of prepositional phrase attachment, conjunctions, and so on, by calling semantic functions,
which are shared by all three languages, from inside the grammar.

(SWIM
  ((CATEGORY . V)
   (IDIOSYNCRACIES (THEME (MAPPING (LITERAL WITH))))  ; "swim with the big fish"
   (OCCS 11)
   (PREDICATE ANIMATE-OBJECT-ACTIVITY)
   (SITUATION-TYPE ACTIVITY)))

(STEP
  ((CATEGORY . V)
   (IDIOSYNCRACIES (SOURCE (MAPPING (LITERAL FROM)))
                   (GOAL (MAPPING (LITERAL ON INTO))))
   (OCCS 36)
   (PREDICATE CHANGING-EVENT)
   (PROB 8.1 . 1)
   (SITUATION-TYPE ACTIVITY)))

(TEAM
  ((CATEGORY . V)
   (IDIOSYNCRACIES (THEME (MAPPING (LITERAL WITH))))
   (OCCS 31)
   (PREDICATE ANIMATE-OBJECT-ACTIVITY)
   (SITUATION-TYPE PROCESS-CAUSED-PROCESS)))

(SWITCH
  ((CATEGORY . V)
   (IDIOSYNCRACIES (SOURCE (MAPPING (LITERAL FROM))))
   (OCCS 161)
   (PREDICATE TURNKEY-CHANGE)
   (PROB 2.1 . 1)
   (SITUATION-TYPE CAUSED-PROCESS)))


Figure 2: Statistically Acquired Lexical Entries

Preparsing takes the burden off main parsing and increases accuracy by recognizing structures such as
sentential complements, appositives, and certain PPs by pattern matching, and sending these to the parser
as chunks. These preparse chunks are parsed prior to main parsing using the same grammars, and their
output consists of F-structures as well. Examples of preparsed structures include:


• Appositives: “industry’s largest Tokyo Kaijou”

• Sentences with certain verb endings: [Japanese example illegible in the source]

• PPs: start production [in january 1990] with production of 20,000 iron

In order to test the progress of grammar development and pinpoint trouble spots, we evaluated the
grammars automatically. SRA adapted the community-wide program Parseval (cf. Black, et al. [6]) for use
with Japanese in addition to English. Testing on Japanese was limited, since there are not many bracketed
Japanese texts to use as answer keys.
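
To make the kind of measurement concrete, the following is a simplified sketch of bracket-based grammar evaluation in the spirit of Parseval. It assumes parses are given as nested lists of words, scores unlabeled brackets only, and omits the normalization steps that the real program applies; the function names are hypothetical.

```python
def brackets(tree, start=0):
    """Return (spans, end) for a nested-list tree whose leaves are words.
    Each span is a (start, end) pair of word positions."""
    spans, pos = set(), start
    for child in tree:
        if isinstance(child, list):
            child_spans, pos = brackets(child, pos)
            spans |= child_spans
        else:
            pos += 1
    spans.add((start, pos))
    return spans, pos

def bracket_scores(candidate, key):
    """Return (precision, recall, crossing) of a candidate parse against
    a hand-bracketed answer key."""
    cand, _ = brackets(candidate)
    gold, _ = brackets(key)
    matched = len(cand & gold)
    # A candidate bracket crosses a key bracket when the two overlap
    # without one containing the other.
    crossing = sum(
        1 for (a, b) in cand
        if any(a < c < b < d or c < a < d < b for (c, d) in gold)
    )
    return matched / len(cand), matched / len(gold), crossing
```

For example, scoring `[["the", "venture"], [["makes", "golf"], "clubs"]]` against the key `[["the", "venture"], ["makes", ["golf", "clubs"]]]` flags the one bracket that groups "makes golf" as crossing.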


Semantic  Interpretation 


Semantic Interpretation uses a language-independent processing module, and its data are predicate-argument
mapping rules for each verb, plus both core and domain knowledge bases. Semantic Interpretation works
off of language-neutral F-structures in order to handle all the languages. It outputs semantic structures, i.e.
predicate-argument and modification relations, as shown in Figure 4. The predicate-argument mapping rules
(i.e. rules which map F-structures to semantic structures) are acquired automatically (cf. Aone and McKee
[3]). Domain knowledge bases, on the other hand, were acquired manually. However, a new rapid knowledge
acquisition tool called KATool was used to link a lexical entry to its corresponding semantic concept in the
knowledge bases (cf. Figure 5).

BRIDGESTONE SPORTS CO. SAID FRIDAY IT HAS SET UP A JOINT VENTURE IN TAIWAN WITH A LOCAL CONCERN AND
A JAPANESE TRADING HOUSE ...

[ST: <S>
 SUBJECT: [ST: <NP>
           HEAD: BRIDGESTONE-SPORTS-CO.]
 ADJUNCTS: ([ST: <NP>
             HEAD: FRIDAY])
 PREDICATE: [ST: <VP>
             TENSE: PAST
             PREDICATE: (COMMUNICATE)
             ROOT: SAY
             SENT-COMP: [ST: <S>
                         SUBJECT: [ST: <NP>
                                   HEAD: IT]
                         PREDICATE: [ST: <VP>
                                     TENSE: PRESENT
                                     ASPECT: PERFECT
                                     PREDICATE: (CREATE)
                                     ROOT: SET
                                     VERB-PARTICLE: UP]
                         OBJECT: [ST: <NP>
                                  HEAD: A-JOINT-VENTURE]
                         PREP-ARGS: ([ST: <PP>
                                      MARKED: WITH
                                      HEAD: A-LOCAL-CONCERN-AND-A-JAPANESE-TRADING-HOUSE])
                         ADJUNCTS: ([ST: <PP>
                                     MARKED: IN
                                     HEAD: TAIWAN])]]]


Figure 3: Simplified F-Structure Output by Syntactic Analysis

If a full parse cannot be created, SOLOMON uses a fragment combination strategy. Debris Parsing
and its subsequent process, Debris Semantics, work together to obtain the best interpretation from sentence
fragments. They use the grammars and knowledge bases as data, and they output semantic structures just
as when a full parse is created. Debris Parsing retrieves the largest and most preferred constituents from
the parse stack. It then reparses the rest of the input, and creates debris F-structures with the best fragment
constituents. Debris Semantics relies on the semantic interpreter to process each fragment, and then fits
fragments together using semantic constraints on unfilled slots.
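
The retrieval step of Debris Parsing might be sketched as a greedy selection over constituents left on the parse stack. This is an illustrative assumption, not SOLOMON's actual procedure: the `Constituent` record, the numeric `preference` score, and the greedy strategy are all hypothetical.

```python
from typing import NamedTuple

class Constituent(NamedTuple):
    start: int       # index of the first token covered
    end: int         # index one past the last token covered
    category: str    # e.g. "NP", "VP", "S"
    preference: int  # hypothetical score; higher is preferred

def select_fragments(constituents):
    """Greedily keep the longest (then most preferred) constituents that
    do not overlap anything already kept, returned in text order."""
    ranked = sorted(constituents,
                    key=lambda c: (c.end - c.start, c.preference),
                    reverse=True)
    chosen = []
    for c in ranked:
        if all(c.end <= k.start or c.start >= k.end for k in chosen):
            chosen.append(c)
    return sorted(chosen, key=lambda c: c.start)
```

Tokens not covered by any selected fragment correspond to "the rest of the input" that would then be reparsed.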


Discourse  Analysis 

Discourse Analysis, which was redesigned and implemented this year (cf. Aone and McKee [4]), performs
reference resolution. Discourse Analysis uses a data-driven architecture to achieve language-independence,
domain-independence, and extensibility. It employs a single language-independent, domain-independent
processor, and several discourse knowledge bases, some of which are shared among different languages. The
output of Discourse Analysis is a set of semantic structures with coreference links added, i.e. File Cards
(cf. Heim [9]). Discourse phenomena handled for the joint venture domain include name anaphora (e.g.
“BRIDGESTONE SPORTS” for “BRIDGESTONE SPORTS CO.”) and definite NPs such as “THE NEW
COMPANY”.

BRIDGESTONE SPORTS CO. SAID FRIDAY IT HAS SET UP A JOINT VENTURE IN
TAIWAN WITH A LOCAL CONCERN AND A JAPANESE TRADING HOUSE ...

(COMMUNICATE-1176 (ISA (VALUE (COMMUNICATE)))
                  (TIME (VALUE (FRIDAY-1178)))
                  (AGENT (VALUE (COMPANY-1146)))
                  (THEME (VALUE (CREATE-1163)))
                  (TENSE (VALUE (PAST))))
(COMPANY-1146 (ISA (VALUE (COMPANY)))
              (QUANTITY (VALUE ((EXACT 1))))
              (UNIT (VALUE (NATURAL-UNIT)))
              (NAMES (VALUE ((BRIDGESTONE SPORTS CO)))))
(CREATE-1163 (ISA (VALUE (CREATE)))
             (LOCATION (VALUE (COUNTRY-1144)))
             (AGENT (VALUE (THING-1166)))
             (THEME (VALUE (TIE-UP-EVENT-1164)))
             (CO-THEME (VALUE ((CONJOINED-COLLECTION COMPANY)-1172)))
             (ASPECT (VALUE (PERFECT)))
             (TENSE (VALUE (PRESENT))))
((CONJOINED-COLLECTION COMPANY)-1172
             (ISA (VALUE ((AND CONJOINED-COLLECTION COMPANY))))
             (HAS-MEMBERS (VALUE (COMPANY-1170 COMPANY-1168))))
(COMPANY-1168 (ISA (VALUE (COMPANY)))
              (QUANTITY (VALUE ((EXACT 1))))
              (UNIT (VALUE (NATURAL-UNIT)))
              (LOCATION (TYPE (AND T PHYSICAL-LOCATION)) (VALUE (LOCAL))))
(COMPANY-1170 (ISA (VALUE (COMPANY)))
              (QUANTITY (VALUE ((EXACT 1))))
              (UNIT (VALUE (NATURAL-UNIT)))
              (NATIONALITY (VALUE (JAPAN))))
(COUNTRY-1144 (ISA (VALUE (COUNTRY)))
              (ENGLISH-GAZ-STRING (VALUE (Taiwan (COUNTRY)))))


Figure 4: Semantic (Predicate-Argument) Structure


Figure 5: Knowledge Acquisition Tool


DISCOURSE: Classified #<DISCOURSE-MARKER DISCOURSE-MARKER-181>("BRIDGESTONE SPORTS") as DP-NAME
DISCOURSE: Found an exact match,
  ante: #<DISCOURSE-MARKER DISCOURSE-MARKER-83>("BRIDGESTONE SPORTS CO.")
  ref: #<DISCOURSE-MARKER DISCOURSE-MARKER-181>("BRIDGESTONE SPORTS")

DISCOURSE: Classified #<DISCOURSE-MARKER DISCOURSE-MARKER-206>("BRIDGESTONE SPORTS") as DP-NAME
DISCOURSE: Found an exact match,
  ante: #<DISCOURSE-MARKER DISCOURSE-MARKER-181>("BRIDGESTONE SPORTS")
  ref: #<DISCOURSE-MARKER DISCOURSE-MARKER-206>("BRIDGESTONE SPORTS")


Figure 6: English Discourse Trace Example


DISCOURSE: Classified #<DISCOURSE-MARKER DISCOURSE-MARKER-511>("[illegible]") as DP-NAME
DISCOURSE: Found an exact match,
  ante: #<DISCOURSE-MARKER DISCOURSE-MARKER-248>("[illegible]")
  ref: #<DISCOURSE-MARKER DISCOURSE-MARKER-51

DISCOURSE: Classified #<DISCOURSE-MARKER DISCOURSE-MARKER-573>("[illegible]") as DP-NAME
DISCOURSE: Found an exact match,
  ante: #<DISCOURSE-MARKER DISCOURSE-MARKER-51
  ref: #<DISCOURSE-MARKER DISCOURSE-MARKER-573>("[illegible]")

Figure 7: Japanese Discourse Trace Example


The system traces for English and Japanese walkthrough examples are shown in Figure 6 and Figure 7.
In the English example, the two instances of name anaphora for “Bridgestone Sports Co.” are recognized,
while in the Japanese example, all the references to “Tokyo Kaijou Kasai Hoken,” including appositives, are
resolved.
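
One way name anaphora of this kind could be resolved is sketched below. It assumes that corporate designators are stripped before comparison, so that a shortened name matches the full name exactly; the designator list and function names are hypothetical, and the discourse knowledge bases described above are not modeled.

```python
import re

# Hypothetical list of corporate designators ignored during matching.
DESIGNATORS = {"CO", "CORP", "INC", "LTD"}

def name_key(name):
    """Normalize a company name to its uppercase words minus designators."""
    words = re.findall(r"[A-Za-z0-9]+", name.upper())
    return tuple(w for w in words if w not in DESIGNATORS)

def resolve_names(mentions):
    """Link each mention to the most recent earlier mention with the same
    normalized name. Returns (mention_index, antecedent_index or None)."""
    last_seen, links = {}, []
    for i, mention in enumerate(mentions):
        key = name_key(mention)
        links.append((i, last_seen.get(key)))
        last_seen[key] = i
    return links
```

Linking each mention to the most recent matching one yields a chain of antecedents like the one in the English trace, where the second short form points back to the first rather than to the full name.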


Pragmatic  Inferencing 

Pragmatic Inferencing performs reasoning in order to derive implicit information from the text, using a
forward chainer and inference rules. Pragmatic Inferencing outputs semantic structures with inferred
information added. It infers additional information from “literal” meanings as required for application
domains. For instance, in the walkthrough example, the following rule is used to infer that “THE TAIWAN
UNIT” is a joint venture company from the phrase “THE ESTABLISHMENT OF THE TAIWAN UNIT”.


(defrule rule-0009 ((?event) (?event))
  :example ("PNI and SRA established a new company.")
  :if (and (establish ?event)
           (theme ?event ?x)
           (company ?x))
  :then (and (tie-up-event ?event)
             (joint-venture-company ?x)
             (joint-venture-company ?event ?x)
             (in-jv-event ?x ?event)))

It is easy for developers to add, change, or remove inferred information because the inference rules are
declarative. For instance, to get an additional tie-up from “Company A and Company B tied with
Company C” in jjv-0002, we just had to add another rule to infer that when companies “tie,” they form a
tie-up.


(defrule rule-0017b ((?event) (?event))
  :example ("PNI tied with SRA")
  :if (and (tie-event ?event)
           (not (theme ?event ?z))
           (agent ?event ?x)
           (company ?x)
           (co-theme ?event ?y)
           (company ?y))
  :then (tie-up-event ?event))
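
To show how declarative rules of this shape can drive a forward chainer, here is a small sketch in Python. It is not SRA's implementation: facts are modeled as plain tuples, negation is treated as failure to find a matching fact, and the rule encoding is an assumption chosen to mirror the two rules above.

```python
def match(pattern, fact, bindings):
    """Unify one pattern with one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):                 # variable
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:                          # constant mismatch
            return None
    return bindings

def solve(conditions, facts, bindings):
    """Yield every set of bindings that satisfies all conditions in order."""
    if not conditions:
        yield bindings
        return
    first, rest = conditions[0], conditions[1:]
    if first[0] == "not":                     # negation as failure
        if not any(match(first[1], f, bindings) is not None for f in facts):
            yield from solve(rest, facts, bindings)
        return
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)

def forward_chain(rules, facts):
    """Apply rules until no new fact can be added; return all facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusions in rules:
            for b in list(solve(conditions, facts, {})):
                for conclusion in conclusions:
                    new = tuple(b.get(term, term) for term in conclusion)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

# The two rules above, re-encoded as (conditions, conclusions) pairs.
RULE_0009 = (
    [("establish", "?event"), ("theme", "?event", "?x"), ("company", "?x")],
    [("tie-up-event", "?event"), ("joint-venture-company", "?x"),
     ("joint-venture-company", "?event", "?x"), ("in-jv-event", "?x", "?event")],
)
RULE_0017B = (
    [("tie-event", "?event"), ("not", ("theme", "?event", "?z")),
     ("agent", "?event", "?x"), ("company", "?x"),
     ("co-theme", "?event", "?y"), ("company", "?y")],
    [("tie-up-event", "?event")],
)
```

Because the rules are data, adding the second rule requires no change to the chainer itself, which is the property the text attributes to the declarative design.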


Extract 

The Extract module performs template generation, translating the domain-relevant portions of our
language-independent semantic structures into database records. We maintain a strong distinction between
processing and data even in template generation. Thus, we use the same processing module to output in
different languages and to several database schemata, including a flat template-style schema as in MUC-4
and a more object-oriented schema as in MUC-5.

To do the actual template filling, we rely on Extract data made up of kb-object/slot to db-table/field
mapping rules and conversion functions for the individual values (e.g. set fills, string fills). For example, the
#nationality slot of an ^ORGANIZATION object in our knowledge base corresponds to the Nationality
field of the Entity object in the MUC-5 template.
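
The separation of mapping data from the generation procedure could look roughly like the sketch below. The rule table, the converter functions, and the output layout are hypothetical stand-ins for the Extract data described above.

```python
def set_fill(value):
    """Hypothetical converter for a set fill such as a country name."""
    return f"{value.capitalize()} (COUNTRY)"

def string_fill(value):
    """Hypothetical converter for a string fill: quote the literal text."""
    return f'"{value}"'

# Hypothetical mapping data:
# (kb_class, kb_slot) -> (db_object, db_field, converter)
MAPPING_RULES = {
    ("ORGANIZATION", "nationality"): ("ENTITY", "NATIONALITY", set_fill),
    ("ORGANIZATION", "aliases"): ("ENTITY", "ALIASES", string_fill),
}

def fill_template(kb_class, slots, rules=MAPPING_RULES):
    """Translate the mapped slots of one knowledge-base object into
    database fields, ignoring slots the schema has no rule for."""
    record = {}
    for slot, value in slots.items():
        rule = rules.get((kb_class, slot))
        if rule is None:
            continue  # slot is not relevant to this schema
        db_object, db_field, convert = rule
        record.setdefault(db_object, {})[db_field] = convert(value)
    return record
```

Targeting a different schema would then mean supplying a different rule table to the same function, rather than writing new generation code.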


REUSABILITY  OF  THE  SYSTEM 

SOLOMON is designed for reusability. Each processing module is data-driven and reusable in other
languages and other domains, as well as in applications other than data extraction (e.g. machine translation,
abstracting, summarization). A large portion of the data is also reusable in:

• Other languages and domains

  - Core knowledge bases

• Other domains

  - Morphological data

  - General lexicons

  - General pattern data (e.g. date, location, personal name, organization name)

  - Grammars

  - Some of the discourse knowledge sources

• Other languages

  - Domain knowledge bases

  - Some of the discourse knowledge sources

  - Inference rules

  - Extract (template generation) data


Figure 8: Reusability of SRA’s MUC-5 System

The data acquisition tools and techniques are also reusable in other languages and domains. The
statistical techniques used to derive lexical information can be reused for other domains. LEXTool, the
lexicon acquisition tool, is multilingual and relies on system data files for category and morphological
information. KBTool, the knowledge base acquisition tool, is language-independent just as the knowledge
bases are language-independent. KATool, the knowledge acquisition tool that links lexicon entries with the
appropriate knowledge base concepts, is entirely data-driven as well, and is therefore completely reusable.
Figure 8 summarizes the reusability of SRA’s MUC-5 system.


TEST  RESULTS  AND  ANALYSIS 


Our MUC-5 results for the English and Japanese joint-venture domain task are shown in Table 1. We spent
10.55 person-months on this task, most of which were devoted to data development for both languages (see
Table 2). The “other” category includes time spent on developing language-independent data such as a
joint-venture domain knowledge base, pragmatic inference rules, and Extract data for template generation.
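
For readers checking the F-measure rows of Table 1, the sketch below uses the standard weighted combination of precision P and recall R, F = (β² + 1)PR / (β²P + R), with β = 1 for P&R, β = 0.5 for 2P&R, and β = 2 for P&2R. The function name is ours, and since the table reports recall and precision as rounded integers, values recomputed from them differ slightly from the published scores.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted F-measure: (beta^2 + 1) * P * R / (beta^2 * P + R).
    beta < 1 favors precision; beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

# Japanese ALL OBJECTS row of Table 1: REC 38, PRE 52.
japanese_p_and_r = f_measure(precision=52, recall=38)
```

With the rounded Japanese figures this gives roughly 43.9, in line with the 43.92 reported for P&R.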


We believe that the results do not indicate the full potential of our system, since the system performance
for both languages was still improving after five months of development. Much of the work we did resulted
in long-term improvements to our overall text understanding capability, all of which will ensure a stronger
base system for future applications. This implies that although the development cycle for a data extraction
system built on a text understanding system may be slower at its current stage of maturity, the potential of
such a system is still unknown and represents a most promising avenue for development. We are particularly
pleased with the success of our Japanese system: no other Japanese MUC-5 site is using the full
understanding approach, yet we did as well, and our performance continues to improve.1

1 In the 18-month Tipster evaluation, the highest JJV F-measure was about 40.


English
                  ERR   UND   OVG   SUB   REC   PRE
ALL OBJECTS        80    66    26    34    22    49
MATCHED ONLY       48    28     8    23    56    71
TEXT FILTERING      -    25     7     -    74    93

                  P&R    2P&R   P&2R
F-MEASURE        30.80  39.56  25.22

Japanese
                  ERR   UND   OVG   SUB   REC   PRE
ALL OBJECTS        70    53    34    20    38    52
MATCHED ONLY       43    28     9    14    61    78
TEXT FILTERING      -     6     1     -    94    98

                  P&R    2P&R   P&2R
F-MEASURE        43.92  48.74  39.97

Table 1: SRA’s Scores for the English and Japanese Joint Venture Domain


task             person-months
EJV              3.2
JJV              2.2
Testing          1.5
Documentation    0.25
Other            3.4

Table 2: SRA’s Time Expenditure for MUC-5


Staff time was the major limiting factor. We needed more time to perform more testing and evaluation
using the scoring program, and to fine-tune the Extract (template generation) mapping rules. We discovered
that we were hampered by formatting errors, and in addition considerable information was “understood” by
the system all the way through, but was not extracted by the template generator. Since the discourse module
was new, it would have been helpful to have additional time to test and expand it. In addition, we needed
more time to fill the OWNERSHIP, REVENUE, and TIME objects, which we simply did not output.


CONCLUSION 


Overall, the data-driven architecture in SOLOMON allowed for minimum work on processing modules when
working on different languages and domains. We ported the system to Spanish in a week for the
demonstration given at the MUC-5 conference.

Although  we  successfully  acquired  large  amounts  of  domain  data  from  domain  texts  in  both  languages, 
using  both  statistical  methods  and  newly  developed  user-friendly  knowledge  acquisition  tools,  we  recognize 
the  need  to  move  even  more  quickly  to  new  domains  and  languages.  We  plan  to  continue  our  work  on 
automatic  acquisition  of  lexicons,  knowledge  bases,  and  links  between  them  in  multiple  languages. 

Tuning the performance of each module (e.g. parsing, discourse analysis), as well as the performance of
the whole system, to a particular task more rapidly is another research issue we identified. We believe that
developing automatic evaluation and training algorithms for such automated module/system tuning is crucial
to developing a data extraction system that produces optimal results.
APPENDIX


A  ejv-0592  SRA’s  Original  Response 


<TEMPLATE-0592-1> :=
    DOC NR: 0592
    DOC DATE: 241189
    DOCUMENT SOURCE: "Jiji Press Ltd.;"
    CONTENT: <TIE_UP_RELATIONSHIP-0592-3>
             <TIE_UP_RELATIONSHIP-0592-2>
<TIE_UP_RELATIONSHIP-0592-2> :=
    TIE-UP STATUS: EXISTING
    ENTITY: <ENTITY-0592-6>
            <ENTITY-0592-5>
    JOINT VENTURE CO: <ENTITY-0592-7>
    ACTIVITY: <ACTIVITY-0592-8>
<ACTIVITY-0592-8> :=
    INDUSTRY: <INDUSTRY-0592-9>
    ACTIVITY-SITE: (Taiwan (COUNTRY) <ENTITY-0592-10>)
<INDUSTRY-0592-9> :=
    INDUSTRY-TYPE: PRODUCTION
    PRODUCT/SERVICE: (67 "A JOINT VENTURE")
<ENTITY-0592-5> :=
    NAME: Taga CO
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-11>
<ENTITY_RELATIONSHIP-0592-11> :=
    ENTITY1: <ENTITY-0592-5>
             <ENTITY-0592-6>
    ENTITY2: <ENTITY-0592-7>
    REL OF ENTITY2 TO ENTITY1: CHILD
    STATUS: CURRENT
<ENTITY-0592-6> :=
    NAME: Union Precision Casting CO
    ALIASES: "Union Precision Casting"
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-11>
<ENTITY-0592-7> :=
    NATIONALITY: Taiwan (COUNTRY)
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-11>
<TIE_UP_RELATIONSHIP-0592-3> :=
    TIE-UP STATUS: EXISTING
    ENTITY: <ENTITY-0592-14>
            <ENTITY-0592-13>
    ACTIVITY: <ACTIVITY-0592-8>
<ENTITY-0592-13> :=
    NAME: Bridgestone Sports CO
    ALIASES: "Bridgestone Sports"
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-15>
<ENTITY_RELATIONSHIP-0592-15> :=
    ENTITY1: <ENTITY-0592-13>
             <ENTITY-0592-14>
    REL OF ENTITY2 TO ENTITY1: PARTNER
    STATUS: CURRENT
<ENTITY-0592-14> :=
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-15>

B  ejv-0592  SRA’s  Corrected  Response 


<TEMPLATE-0592-1> :=
    DOC NR: 0592
    DOC DATE: 241189
    DOCUMENT SOURCE: "Jiji Press Ltd.;"
    CONTENT: <TIE_UP_RELATIONSHIP-0592-4>
             <TIE_UP_RELATIONSHIP-0592-3>
             <TIE_UP_RELATIONSHIP-0592-2>
<TIE_UP_RELATIONSHIP-0592-2> :=
    TIE-UP STATUS: EXISTING
    ENTITY: <ENTITY-0592-7>
            <ENTITY-0592-6>
    JOINT VENTURE CO: <ENTITY-0592-8>
    ACTIVITY: <ACTIVITY-0592-9>
<ACTIVITY-0592-9> :=
    INDUSTRY: <INDUSTRY-0592-10>
    ACTIVITY-SITE: (- <ENTITY-0592-11>)
<INDUSTRY-0592-10> :=
    INDUSTRY-TYPE: PRODUCTION
    PRODUCT/SERVICE: (67 "A JOINT VENTURE")
<ENTITY-0592-6> :=
    NAME: Taga CO
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-12>
<ENTITY_RELATIONSHIP-0592-12> :=
    ENTITY1: <ENTITY-0592-6>
             <ENTITY-0592-7>
    ENTITY2: <ENTITY-0592-8>
    REL OF ENTITY2 TO ENTITY1: CHILD
    STATUS: CURRENT
<ENTITY-0592-7> :=
    NAME: Bridgestone Sports CO
    ALIASES: "Bridgestone Sports"
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-12>
<ENTITY-0592-8> :=
    NAME: Bridgestone Sports Taiwan CO
    ALIASES: "Bridgestone Sports CO"
             "Bridgestone Sports"
    NATIONALITY: Taiwan (COUNTRY)
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-12>
<TIE_UP_RELATIONSHIP-0592-3> :=
    TIE-UP STATUS: EXISTING
    ENTITY: <ENTITY-0592-16>
            <ENTITY-0592-15>
    JOINT VENTURE CO: <ENTITY-0592-17>
    ACTIVITY: <ACTIVITY-0592-9>
<ENTITY-0592-15> :=
    NATIONALITY: Taiwan (COUNTRY)
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-18>
<ENTITY_RELATIONSHIP-0592-18> :=
    ENTITY1: <ENTITY-0592-15>
             <ENTITY-0592-16>
    ENTITY2: <ENTITY-0592-17>
    REL OF ENTITY2 TO ENTITY1: CHILD
    STATUS: CURRENT
<ENTITY-0592-16> :=
    NAME: Union Precision Casting CO
    ALIASES: "Union Precision Casting"
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-18>
<ENTITY-0592-17> :=
    NATIONALITY: Taiwan (COUNTRY)
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-18>
<TIE_UP_RELATIONSHIP-0592-4> :=
    TIE-UP STATUS: EXISTING
    ENTITY: <ENTITY-0592-22>
            <ENTITY-0592-21>
    ACTIVITY: <ACTIVITY-0592-9>
<ENTITY-0592-21> :=
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-23>
<ENTITY_RELATIONSHIP-0592-23> :=
    ENTITY1: <ENTITY-0592-21>
             <ENTITY-0592-22>
    REL OF ENTITY2 TO ENTITY1: PARTNER
    STATUS: CURRENT
<ENTITY-0592-22> :=
    NATIONALITY: Japan (COUNTRY)
    TYPE: COMPANY
    ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-0592-23>


C  jjv-0002  SRA’s  Original  Response 


[Japanese-language template for document 0002 (date 850108); the Japanese field names and fills are
illegible in the source.]

D jjv-0002 SRA’s Corrected Response


[Japanese-language template for document 0002 (date 850108); the Japanese field names and fills are
illegible in the source.]




